59 research outputs found

    UPGRO Hidden Crisis Research consortium: unravelling past failures for future success in rural water supply: initial project approach for assessing rural water supply functionality and levels of performance

    The new Sustainable Development Goals (SDGs) set a much stronger focus on sustainability and performance of water services, and have highly ambitious goals to achieve universal access to safe and reliable water for all by 2030 (UN 2013). Poor functionality of water points threatens to undermine progress, and a lack of knowledge of the reasons behind this makes it difficult to recommend improvements and take corrective action. As a first step it is necessary to be able to reliably monitor current rates of functionality and to have a clear benchmark as to what constitutes a functional water point. Currently, there is no single accepted definition for functionality, although organisations are working towards this as a means of tracking progress towards the SDGs. This report sets out the initial work by the Hidden Crisis project to develop a framework approach to assess functionality in terms of different levels of performance, and a set of standard indicators which can be used to assess functionality. The report presents the results of a literature review examining the following questions: (1) what are the current approaches to defining functionality of hand-pump boreholes (HPBs); and (2) what are the robust standards by which the functionality of an HPB, or population of HPBs, can be assessed. From analyses of the literature we have developed preliminary guidelines and applied these to develop a preliminary framework.

    Accurate Cache and TLB Characterization Using Hardware Counters

    Learning from the Success of MPI

    The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other approaches, including automatic parallelization and directive-based parallelism, are easier to use. This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model. Comment: 12 pages, 1 figure

    Predictive runtime code scheduling for heterogeneous architectures

    Heterogeneous architectures are currently widespread. With the advent of easy-to-program general purpose GPUs, virtually every recent desktop computer is a heterogeneous system. Combining the CPU and the GPU brings great amounts of processing power. However, such architectures are often used in a restricted way for domain-specific applications like scientific applications and games, and they tend to be used by a single application at a time. We envision future heterogeneous computing systems where all their heterogeneous resources are continuously utilized by different applications with versioned critical parts to be able to better adapt their behavior and improve execution time, power consumption, response time and other constraints at runtime. Under such a model, adaptive scheduling becomes a critical component. In this paper, we propose a novel predictive user-level scheduler based on past performance history for heterogeneous systems. We developed several scheduling policies and present the study of their impact on system performance. We demonstrate that such a scheduler allows multiple applications to fully utilize all available processing resources in CPU/GPU-like systems and consistently achieve speedups ranging from 30% to 40% compared to just using the GPU in a single application mode. Postprint (published version)

    Adaptive Loop Tiling for a Multi-cluster CMP

    No full text

    Exploring the optimization space of dense linear algebra kernels

    No full text
    Abstract. Dense linear algebra kernels such as matrix multiplication have been used as benchmarks to evaluate the effectiveness of many automated compiler optimizations. However, few studies have looked at collectively applying the transformations and parameterizing them for external search. In this paper, we take a detailed look at the optimization space of three dense linear algebra kernels. We use a transformation scripting language (POET) to implement each kernel-level optimization as applied by ATLAS. We then extensively parameterize these optimizations from the perspective of a general-purpose compiler and use a standalone empirical search engine to explore the optimization space using several different search strategies. Our exploration of the search space reveals key interaction among several transformations that must be considered by compilers to approach the level of efficiency obtained through manual tuning of kernels.

    GPU vs FPGA: A Comparative Analysis for Non-standard Precision

    No full text
    Abstract. FPGAs and GPUs are increasingly used in a range of high performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most required. Support for half and double-double floating point representations allows additional flexibility to achieve this. The aim of this work is to study the language and hardware support, and the achievable peak performance for non-standard precisions on a GPU and an FPGA. A compute intensive program, matrix-matrix multiply, is selected as a benchmark and implemented for various different matrix sizes. The results show that for large-enough matrices, GPUs out-perform FPGA-based implementations but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than implementation on a GPU.